Towards Effective Entity Extraction of Scientific Documents using Discriminative Linguistic Features

Sangwon Hwang; Jang-Eui Hong; Young-Kwang Nam

TIIS (Korean Society for Internet Information)

Title	Towards Effective Entity Extraction of Scientific Documents using Discriminative Linguistic Features
Authors	Sangwon Hwang; Jang-Eui Hong; Young-Kwang Nam
Citation	Vol. 13, No. 3, pp. 1639-1658 (2019. 03)
Abstract	Named entity recognition (NER) is an important technique for improving the performance of data mining and big data analytics. Previous NER systems identify named entities using statistical methods based on prior information or linguistic features; however, such methods cannot recognize unregistered or unlearned objects. This paper proposes a method that extracts objects, such as technologies, theories, or person names, by analyzing the collocation relationships among words that co-occur around specific words in the abstracts of academic journals. The method proceeds as follows. First, the data are preprocessed using data cleaning and sentence detection to separate the text into single sentences. Then, part-of-speech (POS) tagging is applied to each sentence. Next, the occurrence and collocation information of the other POS tags is analyzed, excluding entity candidates such as nouns. Finally, an entity recognition model is created by analyzing and classifying the information in the sentences.
Keywords	Named entity recognition; entity extraction; data mining; data cleaning; sentence segmentation; information extraction
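
The first three steps described in the abstract (cleaning, sentence detection, POS tagging) and the collection of non-noun context around entity candidates can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy lexicon, the tag names, the rule that unknown words count as noun candidates, the window size, and all function names are assumptions made for illustration, and the final model-building step is not covered.

```python
import re
from collections import Counter

# Hypothetical toy lexicon standing in for a real POS tagger; the entries
# and tag names are illustrative assumptions, not taken from the paper.
TOY_LEXICON = {
    "we": "PRON", "propose": "VERB", "use": "VERB", "uses": "VERB",
    "a": "DET", "the": "DET", "based": "VERB", "on": "ADP",
    "by": "ADP", "called": "VERB", "is": "VERB", "method": "NOUN",
}


def clean(text):
    """Collapse whitespace runs; a stand-in for the data cleaning step."""
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text):
    """Naive sentence detection: split after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def tag(sentence):
    """Tag each token; unknown words default to NOUN (assumed entity candidates)."""
    tokens = re.findall(r"[A-Za-z][A-Za-z-]*", sentence)
    return [(tok, TOY_LEXICON.get(tok.lower(), "NOUN")) for tok in tokens]


def context_counts(tagged, window=2):
    """Count non-noun (word, tag) pairs within `window` tokens of each noun."""
    counts = Counter()
    for i, (_, pos) in enumerate(tagged):
        if pos != "NOUN":
            continue
        lo, hi = max(0, i - window), min(len(tagged), i + window + 1)
        for j in range(lo, hi):
            word, ctx_pos = tagged[j]
            if j != i and ctx_pos != "NOUN":
                counts[(word.lower(), ctx_pos)] += 1
    return counts


def run_pipeline(text):
    """Clean, split, tag, and accumulate context counts over all sentences."""
    total = Counter()
    for sentence in split_sentences(clean(text)):
        total += context_counts(tag(sentence))
    return total


if __name__ == "__main__":
    sample = "We propose a method based on  LSTM. The method uses CRF."
    for pair, n in run_pipeline(sample).most_common():
        print(pair, n)
```

In practice the lexicon lookup would be replaced by a trained POS tagger, and the resulting context counts would serve as features for whatever classifier builds the recognition model.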